Queue Locks on Cache Coherent Multiprocessors

نویسندگان

  • Peter S. Magnusson
  • Anders Landin
  • Erik Hagersten
چکیده

Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A key issue for execution performance of many common applications is the synchronization cost. The communication scalability of synchronization has been improved by the introduction of queue-based spin-locks instead of Test&(Test&Set). For architectures with long access latencies for global data, attention should also be paid to the number of global accesses that are involved in synchronization. We present a method to characterize the performance of proposed queue lock algorithms, and apply it to previously published algorithms. We also present two new queue locks, the LH lock and the M lock. We compare the locks in terms of performance, memory requirements, code size, and required hardware support. The LH lock is the simplest of all the locks, yet requires only an atomic swap operation. The M lock is superior in terms of global accesses needed to perform synchronization and still competitive in all other criteria. We conclude that the M lock is the best overall queue lock for the class of architectures studied.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Eecient Software Synchronization on Large Cache Coherent Multiprocessors

Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A key issue for execution performance of many common applications is the synchronization cost. The communication scalability of synchronization has been improved by the introduction of queue-based spin-locks instead of Test&(Test&Set). For architectures with long access latencies for global data, a...

متن کامل

Parallel Hierarchical Radiosity On Cache-Coherent Multiprocessors

Computing radiosity is a computationally very expensive problem in computer graphics. Recent hierarchical methods have greatly speeded up the computation of first diffuse and now also specular radiosity. We present a parallel algorithm for computing both diffuse and specular radiosity together, and examine its performance in detail on cache-coherent shared address space multiprocessors. We comp...

متن کامل

Abortable Reader-Writer Locks Are No More Complex Than Abortable Mutex Locks

When a process attempts to acquire a mutex lock, it may be forced to wait if another process currently holds the lock. In certain applications, such as real-time operating systems and databases, indefinite waiting can cause a process to miss an important deadline [20]. Hence, there has been research on designing abortable mutual exclusion locks, and fairly efficient algorithms of O(log n) RMR c...

متن کامل

Fast Synchronization on Scalable Cache-Coherent Multiprocessors using Hybrid Primitives

This paper presents a new methodology for implementing fast synchronization on scalable cache-coherent multiprocessors, through the use of hybrid primitives. Hybrid primitives leverage commodity hardware to speed-up the execution of the atomic remote Read-Modify-Write (RMW) instructions employed in synchronization algorithms to resolve contending processors, while exploiting the caches to reduc...

متن کامل

A Hierarchical CLH Queue Lock

Modern multiprocessor architectures such as CC-NUMAmachines or CMPs have nonuniform communication architectures that render programs sensitive to memory access locality. A recent paper by Radović and Hagersten shows that performance gains can be obtained by developing general-purpose mutual-exclusion locks that encourage threads with high mutual memory locality to acquire the lock consecutively...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1994